Deposition and Extension Approach to Find Longest Common Subsequence for Multiple Sequences
نویسنده
چکیده
The problem of finding the longest common subsequence (LCS) for a set of sequences is a very interesting and challenging problem in computer science. This problem is NPcomplete, but because of its importance, many heuristic algorithms have been proposed, such as Long Run algorithm and Expansion algorithm. However, the performance of many current heuristic algorithms deteriorates fast when the number of sequences and sequence length increase. In this paper, we have proposed a post process heuristic algorithm for the LCS problem, the Deposition and Extension algorithm (DEA). This algorithm first generates common subsequence by the process of sequences deposition, and then extends this common subsequence. The algorithm is proven to generate Common Subsequences (CSs) with guaranteed lengths. The experiments show that the results of DEA algorithm are better than those of Long Run and Expansion algorithm, especially on many long sequences. The algorithm also has superior efficiency both in time and space.
منابع مشابه
A New Index-Based Parallel Algorithm for finding Longest Common Subsequence in Multiple DNA Sequences
This paper presents a new Parallel Algorithm for computing a Longest Common Subsequence in Multiple DNA Sequences. It uses a heuristic approach. Although a lot of research has been carried out to find LCS from the two or more given sequences of Protein, DNA, RNA etc, but not many parallel methods exists for finding LCS from multiple sequences. Normally in existing algorithms the time complexity...
متن کاملDevelopment of Cache Oblivious Based Fast Multiple Longest Common Subsequence Technique(CMLCS) for Biological Sequences Prediction
A biological sequence is a single, continuous molecule of nucleic acid or protein. Classical methods for the Multiple Longest Common Subsequence problem (MLCS) problem are based on dynamic programming. The Multiple Longest Common Subsequence problem (MLCS) is used to find the longest subsequence shared between two or more strings. For over 30 years, significant efforts have been made to find ef...
متن کاملThe Longest Common Subsequence Problem
Algorithms on sequences of symbols have been studied for a long time and now form a fundamental part of computer science. One of the very important problems in analysis of sequences is the longest common subsequence problem. For the general case of an arbitrary number of input sequences, the problem is NP-hard. We describe an approach to solve this problem. This approach is based on constructin...
متن کاملAll Common Subsequences
Time series data abounds in real world problems. Measuring the similarity of time series is a key to solving these problems. One state of the art measure is the longest common subsequence. This measure advocates using the length of the longest common subsequence as an indication of similarity between sequences, but ignores information contained in the second, third, ..., longest subsequences. I...
متن کاملCode Similarity Detection in Multiple Large Source Trees using Token Hashes
The ability to find similarities between two source code bases, or within one code base, has many uses including the detection of student plagiarism, the identification of intellectual property violations and the location of repeated code in a code base amenable to refactoring. Previous structure-metric approaches have used either suffix trees or modified Longest Common Subsequence algorithms t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009